AITopics | perceptual score

Machine learning advances in the last decade have relied significantly on large-scale datasets that continue to grow in size. Increasingly, those datasets also contain different data modalities. However, large multi-modal datasets are hard to annotate, and annotations may contain biases that we are often unaware of. Deep-net-based classifiers, in turn, are prone to exploit those biases and to find shortcuts. To study and quantify this concern, we introduce the perceptual score, a metric that assesses the degree to which a model relies on the different subsets of the input features, i.e., modalities. Using the perceptual score, we find a surprisingly consistent trend across four popular datasets: recent, more accurate state-of-the-art multi-modal models for visual question-answering or visual dialog tend to perceive the visual data less than their predecessors. This is concerning as answers are hence increasingly inferred from textual cues only. Using the perceptual score also helps to analyze model biases by decomposing the score into data subset contributions. We hope to spur a discussion on the perceptiveness of multi-modal models and also hope to encourage the community working on multi-modal classifiers to start quantifying perceptiveness via the proposed perceptual score.
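The abstract does not say how the score is computed. As a rough illustration only, one common way to probe how much a model relies on a modality is to shuffle that modality across samples and measure the resulting accuracy drop. The sketch below assumes that approach. The function names, the dict-based sample layout, and the toy text-only model are hypothetical and not taken from the paper, and any normalization the actual metric applies is omitted.

```python
import random

def accuracy(model, samples):
    """Fraction of samples for which the model predicts the stored label."""
    correct = sum(model(s) == s["label"] for s in samples)
    return correct / len(samples)

def modality_reliance(model, samples, modality, trials=20, seed=0):
    """Accuracy drop when `modality` is shuffled across samples.

    Hypothetical proxy for perceptiveness: a value near zero suggests
    the model barely uses that modality.
    """
    rng = random.Random(seed)
    base = accuracy(model, samples)
    total = 0.0
    for _ in range(trials):
        values = [s[modality] for s in samples]
        rng.shuffle(values)
        # Re-pair each sample with another sample's value for this modality.
        shuffled = [{**s, modality: v} for s, v in zip(samples, values)]
        total += accuracy(model, shuffled)
    return base - total / trials

# Toy data (assumption): the label is fully determined by the text field.
samples = [{"image": i, "text": i % 2, "label": i % 2} for i in range(8)]

# Toy model (assumption): ignores the image and answers from text alone.
text_only = lambda s: s["text"]

image_reliance = modality_reliance(text_only, samples, "image")
text_reliance = modality_reliance(text_only, samples, "text")
```

In this toy setup the image reliance is zero because the model never reads the image, which mirrors the concern raised in the abstract about answers being inferred from textual cues only.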

data modality, model perceive, perceptual score, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.40)

1680829293f2a8541efa2647a0290f88-Supplemental.pdf

Neural Information Processing Systems, Oct-2-2025, 13:41:40 GMT

artificial intelligence, machine learning, scanimate, (16 more...)

Neural Information Processing Systems

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
Europe > Switzerland > Zürich > Zürich (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Improving Perceptual Audio Aesthetic Assessment via Triplet Loss and Self-Supervised Embeddings

Wisnu, Dyah A. M. G., Zezario, Ryandhimas E., Rini, Stefano, Wang, Hsin-Min, Tsao, Yu

arXiv.org Artificial Intelligence, Sep-4-2025

We present a system for automatic multi-axis perceptual quality prediction of generative audio, developed for Track 2 of the AudioMOS Challenge 2025. The task is to predict four Audio Aesthetic Scores (Production Quality, Production Complexity, Content Enjoyment, and Content Usefulness) for audio generated by text-to-speech (TTS), text-to-audio (TTA), and text-to-music (TTM) systems. A main challenge is the domain shift between natural training data and synthetic evaluation data. To address this, we combine BEATs, a pretrained transformer-based audio representation model, with a multi-branch long short-term memory (LSTM) predictor and use a triplet loss with buffer-based sampling to structure the embedding space by perceptual similarity. Our results show that this improves embedding discriminability and generalization, enabling domain-robust audio quality assessment without synthetic training data.
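For readers unfamiliar with the triplet loss mentioned above, here is a minimal sketch of its standard margin-based form. The abstract does not state the distance function, the margin, or how the buffer-based sampling picks triplets, so the Euclidean distance, the default margin of 1.0, and the function names below are assumptions for illustration only.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss for one triplet.

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)

# Well-separated triplet: positive at distance 1, negative at distance 5.
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])

# Violating triplet: positive at distance 5, negative at distance 1.
hard = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
```

In the system described above, the anchor and positive would presumably be embeddings of perceptually similar clips, though that pairing rule is not given in the abstract.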

artificial intelligence, deep learning, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2509.03292

Country: North America > United States > Hawaii (0.14)

Genre: Research Report > New Finding (0.87)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

b51a15f382ac914391a58850ab343b00-Supplemental.pdf

Neural Information Processing Systems, Aug-17-2025, 00:37:17 GMT

artificial intelligence, false question, machine learning, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.71)

b51a15f382ac914391a58850ab343b00-Paper.pdf

Neural Information Processing Systems, Aug-17-2025, 00:37:12 GMT

Machine learning advances in the last decade have relied significantly on large-scale datasets that continue to grow in size. Increasingly, those datasets also contain different data modalities.

artificial intelligence, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country: North America > United States > Illinois (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Exploring Gender Bias in Alzheimer's Disease Detection: Insights from Mandarin and Greek Speech Perception

He, Liu, Li, Yuanchao, Feng, Rui, Han, XinRan, Liu, Yin-Long, Yang, Yuwei, Zhu, Zude, Yuan, Jiahong

arXiv.org Artificial Intelligence, Jul-17-2025

Gender bias has been widely observed in speech perception tasks, influenced by the fundamental voicing differences between genders. This study reveals a gender bias in the perception of Alzheimer's Disease (AD) speech. In a perception experiment involving 16 Chinese listeners evaluating both Chinese and Greek speech, we identified that male speech was more frequently identified as AD, with this bias being particularly pronounced in Chinese speech. Acoustic analysis showed that shimmer values in male speech were significantly associated with AD perception, while speech portion exhibited a significant negative correlation with AD identification. Although language did not have a significant impact on AD perception, our findings underscore the critical role of gender bias in AD speech perception. This work highlights the necessity of addressing gender bias when developing AD detection models and calls for further research to validate model performance across different linguistic contexts.

artificial intelligence, machine learning, perception, (15 more...)

arXiv.org Artificial Intelligence

2507.12356

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Perceptual Score: What Data Modalities Does Your Model Perceive?

Neural Information Processing Systems, Jan-18-2025, 19:40:54 GMT

Machine learning advances in the last decade have relied significantly on large-scale datasets that continue to grow in size. Increasingly, those datasets also contain different data modalities. However, large multi-modal datasets are hard to annotate, and annotations may contain biases that we are often unaware of. Deep-net-based classifiers, in turn, are prone to exploit those biases and to find shortcuts. To study and quantify this concern, we introduce the perceptual score, a metric that assesses the degree to which a model relies on the different subsets of the input features, i.e., modalities.

data modality, model perceive, perceptual score, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.43)
